Translating Transliterations

Authors

  • JÖRG TIEDEMANN
  • PETER NABENDE
Abstract

Translating new entity names is important for improving performance in Natural Language Processing (NLP) applications such as Machine Translation (MT) and Cross Language Information Retrieval (CLIR). Usually, transliteration is used to obtain phonetic equivalents in a target language for a given source language word. However, transliteration across different writing systems often results in different representations for a given source language entity name. In this paper, we address the problem of automatically translating transliterated entity names that originally come from a different writing system. These entity names are often spelled differently in languages using the same writing system. We train and evaluate various models based on finite state technology and Statistical Machine Translation (SMT) for a character-based translation of the transliterated entity names. In particular, we evaluate the models for translation of Russian person names between Dutch and English, and between English and French. From our experiments, the SMT models perform best with consistent improvements compared to a baseline method of copying strings.
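As a rough illustration of the setup described in the abstract, the sketch below shows the copy-string baseline and one common way to prepare names for a character-based translation model, where each character is treated as a token. The function names, the `_` boundary marker, and the Dutch/English name pairs are assumptions chosen for illustration; they are not taken from the paper's data or code.

```python
# Minimal sketch, not the authors' implementation.
# Assumption: names are lowercased and split into single-character tokens,
# with "_" standing in for the space between name parts.

def to_char_tokens(name: str) -> str:
    """Turn a name into space-separated character tokens."""
    return " ".join("_" if ch == " " else ch for ch in name.lower())


def from_char_tokens(tokens: str) -> str:
    """Invert to_char_tokens (the original casing is not restored)."""
    return "".join(" " if t == "_" else t for t in tokens.split(" "))


def copy_baseline(source_name: str) -> str:
    """Baseline: output the source spelling unchanged."""
    return source_name


def accuracy(pairs, translate) -> float:
    """Fraction of names whose predicted spelling matches the reference exactly."""
    correct = sum(1 for src, ref in pairs if translate(src) == ref)
    return correct / len(pairs)


# Hypothetical Dutch -> English spellings of Russian surnames (illustrative only).
PAIRS = [
    ("Jeltsin", "Yeltsin"),
    ("Poetin", "Putin"),
    ("Gorbatsjov", "Gorbachev"),
    ("Medvedev", "Medvedev"),
]

if __name__ == "__main__":
    print(to_char_tokens("Boris Jeltsin"))
    print(accuracy(PAIRS, copy_baseline))
```

On these four illustrative pairs the copy baseline is right only where both languages happen to share a spelling, which is the gap a trained character-level model would aim to close.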


Similar resources

Word-Transliteration Alignment

The named-entity phrases in free text represent a formidable challenge to text analysis. Translating a named-entity is important for the task of Cross Language Information Retrieval and Question Answering. However, both tasks are not easy to handle because named-entities found in free text are often not listed in a monolingual or bilingual dictionary. Although it is possible to identify and tra...


Learning to Find English to Chinese Transliterations on the Web

We present a method for learning to find English to Chinese transliterations on the Web. In our approach, proper nouns are expanded into new queries aimed at maximizing the probability of retrieving transliterations from existing search engines. The method involves learning the sublexical relationships between names and their transliterations. At run-time, a given name is automatically extended...


Machine Transliteration

It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, "computer" in English comes out as "konpyuutaa" in Japanese. Translating such items from Japanese back to English is even more challenging, and of practical interest, ...


Transliteration as Constrained Optimization

This paper introduces a new method for identifying named-entity (NE) transliterations in bilingual corpora. Recent works have shown the advantage of discriminative approaches to transliteration: given two strings (ws, wt) in the source and target language, a classifier is trained to determine if wt is the transliteration of ws. This paper shows that the transliteration problem can be formulated...


Lexicon Stratification for Translating Out-of-Vocabulary Words

A language lexicon can be divided into four main strata, depending on origin of words: core vocabulary words, fully- and partially-assimilated foreign words, and unassimilated foreign words (or transliterations). This paper focuses on translation of fully- and partially-assimilated foreign words, called “borrowed words”. Borrowed words (or loanwords) are content words found in nearly all languages, ...




Publication date: 2009